Insights into Machine Learning: Data Clustering and Classification Algorithms for Astrophysical Experiments
Abstract
Data exploration, clustering, and classification are important problems in many experiments in astrophysics, computer vision, bioinformatics, and other fields, and machine learning is increasingly popular for these tasks. This thesis deals with machine learning models based on unsupervised and supervised learning algorithms. In the unsupervised category, we study the Self-Organizing Map (SOM) with a new kernel function. We test the data visualization, exploration, and clustering capabilities of the SOM on real-world data sets, both for finding groups in data (cluster discovery) and for visualizing those clusters. Next we discuss ensemble learning, a specialized technique within supervised learning. Ensemble learning algorithms such as AdaBoost and Bagging are an active research area and have improved classification results on several benchmark data sets. They grow multiple learners and combine them to achieve better accuracy. Decision trees are generally used as the base classifiers for these ensembles. In this thesis we experiment with Random Forests (RF) and Back-Propagation Neural Networks (BPNN) as base classifiers. Random Forests are a recent development in tree-based classifiers and have quickly proven to be one of the most important algorithms in the machine learning literature, showing robust and improved classification results on standard data sets. We test ensembles of random forests on standard data sets from the University of California, Irvine (UCI) database, compare the original random forest algorithm with its ensemble counterparts, and discuss the results. Finally, we address the problem of image data classification with both supervised (ensemble) and unsupervised learning, applying the algorithms developed in the thesis.
The image data come from the MAGIC telescope experiment, which collects images of particle rays coming from outer space. We apply ensembles of RF and BPNN to the supervised classification of these images and compare their performance. We then discuss a SOM system developed for automatic classification of the images using unsupervised techniques.
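The abstract's description of ensemble methods (grow several learners, then combine them) can be illustrated with a minimal bagging sketch. This is not the thesis's implementation: it uses one-dimensional decision stumps rather than Random Forests or BPNNs, the data are made-up toy values, and the names `train_stump`, `bagging`, `data`, and `model` are hypothetical, chosen only for this illustration.

```python
import random
from collections import Counter


def train_stump(points):
    """Fit a 1-D threshold rule: predict `hi` when x > t, otherwise `lo`.

    `points` is a list of (x, label) pairs with labels 0 or 1.
    The rule with the fewest training errors is returned as a function.
    """
    best = None
    for t in sorted({x for x, _ in points}):      # candidate thresholds
        for lo, hi in ((0, 1), (1, 0)):           # both label orientations
            errors = sum((hi if x > t else lo) != y for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: hi if x > t else lo


def bagging(points, n_learners=25, seed=0):
    """Train `n_learners` stumps on bootstrap samples; combine by majority vote."""
    rng = random.Random(seed)                     # fixed seed for repeatability
    learners = []
    for _ in range(n_learners):
        # Bootstrap sample: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        learners.append(train_stump(sample))

    def predict(x):
        votes = Counter(h(x) for h in learners)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (assumed for illustration): class 0 at small x, class 1 at large x.
data = [(x, 0) for x in (0, 1, 2, 3, 4)] + [(x, 1) for x in (6, 7, 8, 9, 10)]
model = bagging(data)
print(model(2), model(8))
```

Each stump sees a different bootstrap sample, so individual thresholds vary, while the majority vote smooths out that variation. An odd number of learners avoids tied votes in the two-class case.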
Similar resources
An improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where there exist many low-level representations of an image, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...
Diagnosis of Heart Disease Based on Meta Heuristic Algorithms and Clustering Methods
Data analysis in cardiovascular diseases is difficult due to the large volume of information. Not all features have an impact on the final results, so it is very important to identify the more effective features. In this study, feature selection with the binary cuckoo optimization algorithm is implemented to reduce the number of features. According to the results, the most appropriate classification fo...
Classification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies to restrict network access. Early methods in this field used obvious traffic features such as port number and protocol type to classify traffic. However, recent changes in applicat...
Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques
Distance metrics play a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...